Adjusting for Confounding with Text Matching∗

Authors

  • Margaret E. Roberts
  • Brandon M. Stewart
  • Richard A. Nielsen
Abstract

We identify situations in which conditioning on text can address confounding in observational studies. We argue that a matching approach is particularly well-suited to this task, but existing matching methods are ill-equipped to handle high-dimensional text data. Our proposed solution is to estimate a low-dimensional summary of the text covariates and condition on this summary via matching. We propose several specific methods for doing so and weigh their merits. We illustrate the importance of conditioning on text to address confounding with three applications: the effects of censorship on Chinese bloggers, the effect of perceptions of author gender on citation counts in academia, and the effect of Usama bin Laden’s death on the popularity of his writings.

∗We thank the following for helpful comments and suggestions on this work: David Blei, Naoki Egami, Chris Felton, James Fowler, Justin Grimmer, Erin Hartman, Chad Hazlett, Seth Hill, Kosuke Imai, Rebecca Johnson, Gary King, Adeline Lo, Will Lowe, Chris Lucas, Ian Lundberg, Walter Mebane, David Mimno, Jennifer Pan, Marc Ratkovick, Matt Salganik, Caroline Tolbert, Simone Zhang, audiences at the Princeton Text Analysis Workshop, Princeton Politics Methods Workshop, Microsoft Research, Text as Data Conference, the Political Methodology Society and the Visions in Methodology conference, and some tremendously helpful anonymous reviewers. We especially thank Dustin Tingley for numerous insightful conversations on the connections between STM and causal inference. Dan Maliniak, Ryan Powers, and Barbara Walter graciously supplied data and replication code for the gender and citations study. Crimson Hexagon provided data for the study of blogs in China. The JSTOR Data for Research program provided academic journal data for the international relations application. This research was supported, in part, by The Eunice Kennedy Shriver National Institute of Child Health & Human Development under grant P2-CHD047879 to the Office of Population Research at Princeton University. The research was also supported by grants from the National Science Foundation RIDIR program, award numbers 1738411 and 1738288.

†Assistant Professor, Department of Political Science, University of California, San Diego, Social Sciences Building 301, 9500 Gilman Drive, #0521, La Jolla, CA 92093-0521, 360-921-3540, [email protected], MargaretRoberts.net

‡Assistant Professor, Department of Sociology, Princeton University, 149 Wallace Hall, Princeton, NJ 08544, 757-636-0956, [email protected], brandonstewart.org

§Associate Professor, Department of Political Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, E53 Room 455, Cambridge, MA, 02139, 617-254-4823, [email protected], http://www.mit.edu/ rnielsen/research.htm

¶The first draft of this paper was circulated in the summer of 2015 under the title “Matching Methods for High-Dimensional Data with Applications to Text.”

Similar resources

An Introduction to Instrumental Variables

Instrumental variables (IVs) are used to control for confounding and measurement error in observational studies. They allow for the possibility of making causal inferences with observational data. Like propensity scores, IVs can adjust for both observed and unobserved confounding effects. Other methods of adjusting for confounding effects, which include stratification, matching and multiple reg...

Head to head comparison of the propensity score and the high-dimensional propensity score matching methods.

BACKGROUND Comparative performance of the traditional propensity score (PS) and high-dimensional propensity score (hdPS) methods in the adjustment for confounding by indication remains unclear. We aimed to identify which method provided the best adjustment for confounding by indication within the context of the risk of diabetes among patients exposed to moderate versus high potency statins. M...

Propensity score methods to adjust for confounding in assessing treatment effects: bias and precision

There is an increasing interest in the use of propensity score (PS) methods for confounding control, with generally three ways of estimating adjusted treatment effects in pharmacoepidemiological studies: 1) stratification on PS, 2) matching on PS and 3) using PS as a covariate. To assess bias and precision of different methods, we conducted simulations in three scenarios: 1) treatment had no ef...

Effectiveness of triple therapy with direct-acting antivirals for hepatitis C genotype 1 infection: application of propensity score matching in a national HCV treatment registry

BACKGROUND Observational studies are used to measure the effectiveness of an intervention in non-experimental, real world scenarios at the population level and are recognised as an important component of the evidence pyramid. Such data can be accrued through prospective cohort studies and a patient registry is a proven method for this type of study. The national hepatitis C (HCV) registry was e...

A Novel Assisted History Matching Workflow and its Application in a Full Field Reservoir Simulation Model

The significant increase in using reservoir simulation models poses significant challenges in the design and calibration of models. Moreover, conventional model calibration, history matching, is usually performed using a trial and error process of adjusting model parameters until a satisfactory match is obtained. In addition, history matching is an inverse problem, and hence it may have non-uni...

Publication date: 2018